Robots Can Speak Authentic Dialects! Bailing-TTS, the First Mandarin-Dialect Mixed Speech TTS Model, Is Here
Bailing-TTS marks a significant advance in dialect speech synthesis. Built on a multi-layer autoregressive transformer trained on a large-scale dialect dataset, it efficiently converts text into Chinese dialect speech that approaches human quality. The model combines a continual semi-supervised learning strategy with a dialect-specific mixture-of-experts architecture and multi-stage training, which significantly improves the naturalness and quality of the generated speech. The research reports that speech generated by Bailing-TTS performs well across a range of dialects, and the technology has broad application prospects, such as